Method and apparatus for pipelined joint equalization and decoding for gigabit communications

ABSTRACT

A method and apparatus are disclosed for the implementation of reduced state sequence estimation with an increased throughput using precomputation (look-ahead), with only a linear increase in hardware complexity with respect to the look-ahead depth. The present invention limits the increase in hardware complexity by taking advantage of past decisions (or survivor symbols). The critical path of a conventional RSSE implementation is broken up into at least two smaller critical paths using pipeline registers. Various reduced state sequence estimation implementations are disclosed that employ one-step or multiple-step look-ahead techniques to process a signal received from a dispersive channel having a channel memory.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Application No. 60/245,519, filed Nov. 3, 2000.

FIELD OF THE INVENTION

[0002] The present invention relates generally to channel equalization and decoding techniques, and more particularly, to sequence estimation techniques with shorter critical paths.

BACKGROUND OF THE INVENTION

[0003] The transmission rates for local area networks (LANs) that use unshielded twisted pair (UTP) copper cabling have progressively increased from 10 Megabits-per-second (Mbps) to 1 Gigabit-per-second (Gbps). The Gigabit Ethernet 1000BASE-T standard, for example, operates at a clock rate of 125 MHz and uses UTP cabling of Category 5 with four pairs to transmit 1 Gbps. Trellis-coded modulation (TCM) is employed by the transmitter, in a known manner, to achieve coding gain. The signals arriving at the receiver are typically corrupted by intersymbol interference (ISI), crosstalk, echo, and noise. A major challenge for 1000BASE-T receivers is to jointly equalize the channel and decode the corrupted trellis-coded signals at the demanded clock rate of 125 MHz, as the algorithms for joint equalization and decoding incorporate non-linear feedback loops that cannot be pipelined.

[0004] Data detection is often performed using maximum likelihood sequence estimation, to produce the output symbols or bits. A maximum likelihood sequence estimator considers all possible sequences and determines which sequence was actually transmitted, in a known manner. The maximum likelihood sequence estimator is the optimum decoder and applies the well-known Viterbi algorithm to perform joint equalization and decoding. For a more detailed discussion of a Viterbi implementation of a maximum likelihood sequence estimator (MLSE), see Gerhard Fettweis and Heinrich Meyr, “High-Speed Parallel Viterbi Decoding Algorithm and VLSI-Architecture,” IEEE Communications Magazine (May 1991), incorporated by reference herein.

[0005] In order to reduce the hardware complexity for the maximum likelihood sequence estimator that applies the Viterbi algorithm, a number of sub-optimal approaches which are referred to as reduced-state sequence estimation (RSSE) have been proposed. For a discussion of reduced state sequence estimation techniques, as well as the special cases of decision-feedback sequence estimation (DFSE) and parallel decision-feedback decoding (PDFD) techniques, see, for example, P. R. Chevillat and E. Eleftheriou, “Decoding of Trellis-Encoded Signals in the Presence of Intersymbol Interference and Noise”, IEEE Trans. Commun., vol. 37, 669-76, (July 1989), M. V. Eyuboglu and S. U. H. Qureshi, “Reduced-State Sequence Estimation For Coded Modulation On Intersymbol Interference Channels”, IEEE JSAC, vol. 7, 989-95 (August 1989), or A. Duel-Hallen and C. Heegard, “Delayed Decision-Feedback Sequence Estimation,” IEEE Trans. Commun., vol. 37, pp. 428-436, May 1989, each incorporated by reference herein.

[0006] Generally, reduced state sequence estimation techniques reduce the complexity of the maximum likelihood sequence estimators by merging several states. The RSSE technique incorporates non-linear feedback loops that cannot be pipelined. The critical path associated with these feedback loops is the limiting factor for high-speed implementations.

[0007] U.S. patent application Ser. No. 09/326,785, filed Jun. 4, 1999 and entitled “Method and Apparatus for Reducing the Computational Complexity and Relaxing the Critical Path of Reduced State Sequence Estimation (RSSE) Techniques,” incorporated by reference herein, discloses a technique that reduces the hardware complexity of RSSE for a given number of states and also relaxes the critical path problem. U.S. patent application Ser. No. 09/471,920, filed Dec. 23, 1999, entitled “Method and Apparatus for Shortening the Critical Path of Reduced Complexity Sequence Estimation Techniques,” incorporated by reference herein, discloses a technique that improves the throughput of RSSE by pre-computing the possible values for the branch metrics in a look-ahead fashion to permit pipelining and the shortening of the critical path. The complexity of the pre-computation technique, however, increases exponentially with the length of the channel impulse response. In addition, the delay through the selection circuitry that selects the actual branch metrics among all precomputed ones increases with the channel memory L, eventually neutralizing the speed gain achieved by the precomputation.

[0008] A need therefore exists for a technique that increases the throughput of RSSE algorithms using precomputations with only a linear increase in hardware complexity with respect to the look-ahead computation depth.

SUMMARY OF THE INVENTION

[0009] Generally, a method and apparatus are disclosed for the implementation of reduced state sequence estimation with an increased throughput using precomputations (look-ahead), while only introducing a linear increase in hardware complexity with respect to the look-ahead depth. RSSE techniques typically decode a received signal and compensate for intersymbol interference using a decision feedback unit (DFU), a branch metrics unit (BMU), an add-compare-select unit (ACSU) and a survivor memory unit (SMU). The present invention limits the increase in hardware complexity by taking advantage of past decisions. The past decision may be a past ACS decision of the ACSU or a past survivor symbol in the SMU or a combination thereof. The critical path of a conventional RSSE implementation is broken up into at least two smaller critical paths using pipeline registers.

[0010] A reduced state sequence estimator is disclosed that employs a one-step look-ahead technique to process a signal received from a dispersive channel having a channel memory. Initially, a speculative intersymbol interference estimate is precomputed based on a combination of (i) a speculative partial intersymbol interference estimate for a first postcursor tap of the channel impulse response, based on each possible value for a data symbol, and (ii) a combination of partial intersymbol interference estimates for each subsequent postcursor tap of the channel impulse response, where at least one of the partial intersymbol interference estimates for the subsequent postcursor taps is based on a past survivor symbol from the corresponding state. In addition, a branch metric is precomputed based on the precomputed intersymbol interference estimate. One of the precomputed branch metrics is selected based on a past decision from the corresponding state. The past decision may be a past ACS decision of the ACSU or a past survivor symbol in the SMU or a combination of both. The selected branch metric is used to compute new path metrics for path extensions from a corresponding state. The computed new path metrics are used to determine the best survivor path and path metric for a corresponding state.

[0011] A reduced state sequence estimator is also disclosed that employs a multiple-step look-ahead technique to process a signal received from a dispersive channel having a channel memory. Initially, a speculative partial intersymbol interference estimate is precomputed for each of a plurality of postcursor taps of the channel impulse response, based on each possible value for a data symbol. Thereafter, a partial intersymbol interference estimate is selected for each of the plurality of postcursor taps other than a first postcursor tap based on a past decision from a corresponding state. The past decision may be a past ACS decision of the ACSU or a past survivor symbol in the SMU or a combination of both. A precomputed partial intersymbol interference estimate for the first postcursor tap is referred to as a precomputed intersymbol interference estimate. In addition, speculative branch metrics are precomputed based on the precomputed intersymbol interference estimates. One of the precomputed branch metrics is selected based on a past decision from a corresponding state. The past decision may be a past ACS decision of the ACSU or a past survivor symbol in the SMU or a combination of both. The selected branch metric is used to compute new path metrics for path extensions from a corresponding state. The computed new path metrics are used to determine the best survivor path and path metric for a corresponding state.

[0012] In further variations, intersymbol interference estimates can be selected among precomputed intersymbol interference estimates without precomputing branch metrics, or the partial intersymbol interference estimates can be precomputed for a group of taps, with a precomputation for all possible data symbol combinations corresponding to the groups of taps and selection for each group.

[0013] A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.

BRIEF DESCRIPTION OF THE DRAWING

[0014] FIG. 1 illustrates a channel impulse response with channel memory, L;

[0015] FIG. 2 illustrates a communication system in which the present invention may operate;

[0016] FIG. 3 illustrates a trellis associated with a channel of memory length L=1 and binary data symbols;

[0017] FIG. 4 illustrates a block diagram for an implementation of the Viterbi algorithm (VA);

[0018] FIG. 5 illustrates a state-parallel implementation of the ACSU of FIG. 4 for a channel of memory L=1;

[0019] FIG. 6 is a table analyzing the complexity and critical path of MLSE and RSSE techniques;

[0020] FIG. 7A illustrates the architecture of a reduced state sequence estimator;

[0021] FIG. 7B illustrates an implementation of the look-up tables in the BMU of FIG. 7A;

[0022] FIG. 8 illustrates an exemplary look-ahead architecture for an RSSE algorithm with one-step look-ahead in accordance with one embodiment of the present invention;

[0023] FIG. 9 illustrates an exemplary look-ahead architecture for an RSSE algorithm with multiple-step look-ahead in accordance with another embodiment of the present invention;

[0024] FIG. 10 is a table analyzing the complexity and critical path of a pipelined RSSE in accordance with the present invention;

[0025] FIGS. 11 and 12 illustrate alternate implementations of the RSSE algorithms with one-step look-ahead (FIG. 8) and multiple-step look-ahead (FIG. 9), respectively;

[0026] FIG. 13 illustrates a trellis for a multi-dimensional trellis code, such as the 1000BASE-T trellis code;

[0027] FIG. 14 is a schematic block diagram illustrating a pipelined parallel decision feedback decoder (PDFD) architecture that decodes the 1000BASE-T trellis code and equalizes intersymbol interference in accordance with the present invention;

[0028] FIG. 15 is a schematic block diagram illustrating an embodiment of the look-ahead decision feedback unit (LA-DFU) of FIG. 14;

[0029] FIG. 16 is a schematic block diagram illustrating an embodiment of the intersymbol interference selection unit (ISI-MUXU) of FIG. 14;

[0030] FIG. 17 is a schematic block diagram illustrating an embodiment of the one dimensional look-ahead branch metrics unit (1D-LA-BMU) of FIG. 14; and

[0031] FIG. 18 is a schematic block diagram illustrating an embodiment of the survivor memory unit (SMU) of FIG. 14.

DETAILED DESCRIPTION

[0032] As previously indicated, the processing speed of conventional reduced state sequence estimation (RSSE) implementations is limited by a recursive feedback loop. According to one feature of the present invention, the processing speed of reduced state sequence estimation implementations is improved by pipelining the branch metric and decision-feedback computations, such that the critical path is reduced to be of the same order as in a traditional Viterbi decoder. The additional hardware required by the present invention scales only linearly with the look-ahead depth. The presented algorithm allows the VLSI implementation of RSSE for high-speed applications such as Gigabit Ethernet over copper. Reduced complexity sequence estimation techniques are disclosed for uncoded signals, where the underlying trellis has no parallel state transitions, as well as for signals encoded with a multi-dimensional trellis code having parallel transitions, such as signals encoded according to the 1000BASE-T Ethernet standard. It should be understood that the disclosed pipelining technique can be applied whenever RSSE is being used, e.g., to any kind of trellis or modulation scheme. The disclosed examples are used for illustration purposes only and are not intended to limit the scope of the invention.

System Model

[0033] FIG. 1 illustrates a channel impulse response with channel memory, L. As shown in FIG. 1, there is a main tap corresponding to time 0, and there are L postcursor taps. The first K postcursor taps shown in FIG. 1 after the main tap are used for the construction of the reduced-state trellis, as discussed below.

[0034] FIG. 2 illustrates a communication system 200 having a channel 210 and a sequence estimator 220. The output of the channel 210 at time n is given by

$$z_n = \sum_{i=0}^{L} f_i \cdot a_{n-i} + w_n, \qquad (1)$$

[0035] where {f_(i)}, 0≤i≤L, are the finite impulse response channel coefficients (f₀=1 is assumed without loss of generality), L is the channel memory, a_(n) is the data symbol at time n, and w_(n) is zero-mean Gaussian noise. The decision of the sequence estimator 220 corresponding to a_(n) is denoted by a_(n)′. The illustrative embodiment assumes that the symbols are binary, i.e., a_(n)∈{−1,1}, and that trellis-coded modulation (TCM) is not employed. The present invention may be applied, however, to non-binary modulation and TCM, such as the coding and modulation scheme used in Gigabit Ethernet over copper, as would be apparent to a person of ordinary skill in the art.
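By way of illustration only, the channel model of equation (1) may be sketched in Python as follows. The function name `channel_output`, the `preamble` symbol assumed to have been sent before time 0, and the fixed random seed are assumptions of this sketch and are not part of the disclosure.

```python
import random


def channel_output(symbols, f, noise_std=0.0, preamble=1, seed=0):
    """Return z_n = sum_{i=0}^{L} f[i] * a_{n-i} + w_n for each n, per equation (1).

    symbols  : list of data symbols a_n (binary, -1 or +1, in the illustrative case)
    f        : channel coefficients [f_0, f_1, ..., f_L], with f_0 = 1
    preamble : symbol assumed for all times before n = 0 (assumption of this sketch)
    """
    rng = random.Random(seed)
    L = len(f) - 1
    padded = [preamble] * L + list(symbols)  # padded[L + n] holds a_n
    z = []
    for n in range(len(symbols)):
        clean = sum(f[i] * padded[L + n - i] for i in range(L + 1))
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        z.append(clean + noise)
    return z
```

For example, with f = [1.0, 0.5] (channel memory L=1) and no noise, each output is the current symbol plus half of the previous symbol.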

[0036] The optimum method for the recovery of the transmitted symbols is MLSE, which applies the Viterbi algorithm (VA) to the trellis defined by the channel state

ρ_(n)=(a _(n−1) ,a _(n−2) , . . . ,a _(n−L)).  (2)

[0037] A binary symbol constellation is assumed. Thus, the number of states processed by the VA is given by:

S=2^(L),  (3)

[0038] and two branches leave or enter each state of the trellis. FIG. 3 shows a trellis 300 associated with a channel of memory length L=1. The branch metric for a transition from state ρ_(n) under input a_(n) is given by

$$\lambda_n(z_n, a_n, \rho_n) = \left( z_n - a_n - \sum_{i=1}^{L} f_i a_{n-i} \right)^2. \qquad (4)$$

[0039] The VA determines the best survivor path into state ρ_(n+1) from the two predecessor states {ρ_(n)} by evaluating the following add-compare-select (ACS) function:

$$\Gamma_{n+1}(\rho_{n+1}) = \min_{\{\rho_n\} \to \rho_{n+1}} \left( \Gamma_n(\rho_n) + \lambda_n(z_n, a_n, \rho_n) \right), \qquad (5)$$

[0040] where Γ_(n)(ρ_(n)) is the path metric for state ρ_(n).

[0041] The block diagram for an implementation of the VA is shown in FIG. 4. As shown in FIG. 4, the VA 400 includes a branch metrics unit (BMU) 410, an add-compare-select unit (ACSU) 420 and a survivor memory unit (SMU) 430. The BMU 410 calculates the 2^(L+1) branch metrics (BMs), the ACSU 420 performs the ACS operation for each of the S states, and the SMU 430 keeps track of the S survivor paths.
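The interplay of the branch metric of equation (4) and the ACS recursion of equation (5) may be illustrated by the following software sketch. It is a behavioral model only and does not represent the hardware partitioning of FIG. 4; the function name `viterbi_mlse`, the known starting state (all earlier symbols equal to `init_symbol`) and the storage of full survivor paths in lists are assumptions of this sketch.

```python
from itertools import product


def viterbi_mlse(z, f, init_symbol=1):
    """Behavioral MLSE for binary symbols over the channel f = [f_0, ..., f_L]."""
    L = len(f) - 1
    INF = float("inf")
    states = list(product((-1, 1), repeat=L))  # state = (a_{n-1}, ..., a_{n-L})
    start = (init_symbol,) * L                 # assumed known starting state
    metric = {s: (0.0 if s == start else INF) for s in states}
    paths = {s: [] for s in states}
    for zn in z:
        new_metric = {s: INF for s in states}
        new_paths = {s: [] for s in states}
        for s in states:
            if metric[s] == INF:
                continue  # state not yet reachable
            isi = sum(f[i] * s[i - 1] for i in range(1, L + 1))
            for a in (-1, 1):
                bm = (zn - a - isi) ** 2       # branch metric, equation (4)
                nxt = ((a,) + s)[:L]           # successor state
                cand = metric[s] + bm          # add
                if cand < new_metric[nxt]:     # compare and select, equation (5)
                    new_metric[nxt] = cand
                    new_paths[nxt] = paths[s] + [a]
        metric, paths = new_metric, new_paths
    best = min(states, key=lambda s: metric[s])
    return paths[best]
```

In a noiseless case the sketch recovers the transmitted sequence exactly, since only the true path accumulates a path metric of zero.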

[0042] The ACSU 420 is the bottleneck for maximum throughput as the operations in the BMU 410 and SMU 430 are feedforward and can thus be pipelined using pipeline registers 415 and 425. A state-parallel implementation of the ACSU 420 yields the highest processing speed and is shown in FIG. 5 for a channel of memory L=1 (the corresponding trellis was shown in FIG. 3).

[0043] The recursive loop of the ACS operation associated with equation (5) determines the critical path in the ACSU 420, as it cannot be pipelined. It can be seen from FIG. 5 that this loop comprises one addition (ADD) 510, one 2-way comparison 520, whose delay is about the same as one ADD, and a 2-way selection 530, corresponding to a 2-to-1 multiplexer (MUX). Hereinafter, shift registers will not be considered in the critical path analysis due to their minor delay. FIG. 6 is a table 600 analyzing the complexity and critical path of MLSE and RSSE. Column 620 of table 600 summarizes the computational complexity and critical path of MLSE for binary signals corrupted by a channel of memory L. It is noted that in addition to a state-parallel implementation shown in FIG. 5, the throughput of the VA can be even further increased by introducing parallelism on the bit, block and algorithmic level (for a good summary, see e.g. H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers, John Wiley & Sons, pp. 568-569, 1998). However, this comes at a significant increase in complexity and/or latency.

Reduced-state Sequence Estimation

[0044] RSSE reduces the complexity of MLSE by truncating the channel memory ρ_(n), as described in A. Duel-Hallen and C. Heegard, “Delayed Decision-Feedback Sequence Estimation,” IEEE Trans. Commun., vol. 37, 428-436, May 1989, or applying set partitioning to the signal alphabet as described in P. R. Chevillat and E. Eleftheriou, “Decoding of Trellis-Encoded Signals in the Presence of Intersymbol Interference and Noise,” IEEE Trans. Commun., vol. 37, pp. 669-676, Jul. 1989 or M. V. Eyuboglu and S. U. Qureshi, “Reduced-State Sequence Estimation for Coded Modulation on Intersymbol Interference Channels,” IEEE J. Sel. Areas Commun., vol. 7, pp. 989-995, Aug. 1989. Similar to the VA, RSSE searches for the most likely data sequence in the reduced trellis by keeping only the best survivor path for each reduced state. In the exemplary embodiment discussed herein, the reduced state ρ_(n)′ is obtained by truncating equation (2) to K yielding

ρ_(n)′=(a _(n−1) ,a _(n−2) , . . . ,a _(n−K)),0≦K≦L.  (6)

[0045] In this case, the number of reduced states is given by

S′=2^(K).  (7)

[0046] The results may be generalized to the cases given in P. R. Chevillat and E. Eleftheriou or M. V. Eyuboglu and S. U. Qureshi, referenced above. The branch metric for a transition from reduced state ρ_(n)′ under input a_(n) is given by

λ_(n)′(z _(n) ,a _(n) ,ρ _(n)′)=(z _(n) −a _(n) +u _(n)(ρ_(n)′))²,  (8)

[0047] where

$$u_n(\rho_n') = -\sum_{i=1}^{K} f_i a_{n-i} - \sum_{i=K+1}^{L} f_i \hat{a}_{n-i}(\rho_n'). \qquad (9)$$

[0048] In equation (9), u_(n)(ρ_(n)′) is the decision-feedback for ρ_(n)′ and â_(n−i)(ρ_(n)′) is the symbol of the survivor path into state ρ_(n)′ which corresponds to time n−i. As the first K survivor symbols (â_(n−1)(ρ_(n)′),â_(n−2)(ρ_(n)′), . . . ,â_(n−K)(ρ_(n)′)) from the survivor path into state ρ_(n)′ are equal to the symbols (a_(n−1),a_(n−2), . . . ,a_(n−K)) defining this state, equation (9) can be rewritten as

$$u_n(\rho_n') = -\sum_{i=1}^{L} f_i \hat{a}_{n-i}(\rho_n'). \qquad (10)$$

[0049] Among all paths entering reduced state ρ′_(n+1) from the 2 predecessor states {ρ_(n)′}, the most likely path with metric Γ′_(n+1)(ρ′_(n+1)) is chosen according to the ACS operation:

$$\Gamma_{n+1}'(\rho_{n+1}') = \min_{\{\rho_n'\} \to \rho_{n+1}'} \left( \Gamma_n'(\rho_n') + \lambda_n'(z_n, a_n, \rho_n') \right). \qquad (11)$$
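The RSSE recursion of equations (8), (10) and (11) may be illustrated, without limitation, by the following behavioral sketch, in which each reduced state carries its own survivor sequence that supplies the decision-feedback. The function name `rsse`, the known starting history of `init_symbol` values and the list-based survivor storage are assumptions of this sketch; a hardware SMU would instead use a register-exchange structure.

```python
from itertools import product


def rsse(z, f, K, init_symbol=1):
    """Behavioral RSSE for binary symbols with 2**K reduced states, 0 <= K <= L."""
    L = len(f) - 1
    INF = float("inf")
    states = list(product((-1, 1), repeat=K))  # reduced state (a_{n-1}, ..., a_{n-K})
    start = (init_symbol,) * K
    metric = {s: (0.0 if s == start else INF) for s in states}
    # surv[s] holds the survivor symbols of state s, most recent first.
    surv = {s: [init_symbol] * L for s in states}
    for zn in z:
        new_metric = {s: INF for s in states}
        new_surv = {s: surv[s] for s in states}
        for s in states:
            if metric[s] == INF:
                continue  # state not yet reachable
            # Decision-feedback from the survivor of this state, equation (10).
            u = -sum(f[i] * surv[s][i - 1] for i in range(1, L + 1))
            for a in (-1, 1):
                bm = (zn - a + u) ** 2         # branch metric, equation (8)
                nxt = ((a,) + s)[:K]           # successor reduced state
                cand = metric[s] + bm
                if cand < new_metric[nxt]:     # ACS operation, equation (11)
                    new_metric[nxt] = cand
                    new_surv[nxt] = [a] + surv[s]
        metric, surv = new_metric, new_surv
    best = min(states, key=lambda s: metric[s])
    return list(reversed(surv[best][:len(z)]))
```

With K=0 the sketch degenerates to a single-state decision-feedback equalizer, and with K=L it processes the same number of states as MLSE. Note that the decision-feedback u depends on the survivor updated in the previous iteration, which is the recursive dependency that limits the processing speed.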

[0050] The state-parallel architecture for RSSE with the parameters L=4 and K=1 is shown in FIG. 7A. It can be seen from FIG. 7A that the RSSE 700 architecture comprises four functional blocks, namely, a decision feedback unit (DFU) 710, a branch metrics unit (BMU) 720, an add-compare-select unit (ACSU) 730 and a survivor memory unit (SMU) 740. As the corresponding reduced trellis is the same as the one in FIG. 3, the ACSU 730 shown in FIG. 7A has the same architecture as the ACSU 420 given in FIG. 5. The part of the SMU 740 that stores the L−K survivor symbols (â_(n−K−1)(ρ_(n)′),â_(n−K−2)(ρ_(n)′), . . . ,â_(n−L)(ρ_(n)′)) for each reduced state must be implemented in a register-exchange architecture as described in R. Cypher and C. B. Shung, “Generalized trace-back techniques for survivor memory management in the Viterbi algorithm,” J. VLSI Signal Processing, vol. 5, pp. 85-94, 1993, as these symbols are required for the evaluation of equation (9) in the DFU 710 without delay. Because of the binary modulation, the multipliers in the DFU 710 can be implemented using shifters (SHIFTs). Look-up tables (LUTs) approximate the squaring function in equation (8) in the BMU, as defined by FIG. 7B.

[0051] RSSE 700 has less computational complexity than MLSE for the same channel memory L, as RSSE processes fewer states, at the expense of a significantly longer critical path. It can be seen from FIG. 7A that there is a recursive loop which comprises one SHIFT and L−K+1 ADDs in the DFU 710 (the first term in the right hand side of equation (9) can be computed outside the loop), one LUT in the BMU 720, one add-compare in the ACSU 730 (which is roughly equal to two ADDs in terms of delay), and a 2-to-1 MUX in the SMU 740. All these operations must be completed within one symbol period and cannot be pipelined. In contrast to this, the critical path in MLSE just comprises the ACS operation. Also, due to the different structure of the recursive loop in the RSSE 700, the block processing methods which have been developed to speed up the VA (see H. Meyr et al., Digital Communication Receivers, John Wiley & Sons, 568-569 (1998)) cannot be applied to increase the throughput of RSSE. Therefore, the maximum throughput of RSSE is potentially significantly lower than that of MLSE. Furthermore, the throughput of RSSE depends on the channel memory such that it decreases for increasing L. FIG. 6 summarizes the comparison of MLSE and RSSE in terms of computational complexity and critical path.

Pipelined RSSE

[0052] It was suggested in E. F. Haratsch and K. Azadet, “High-speed reduced-state sequence estimation,” Proc. IEEE Int. Symp. Circuits and Systems, May 2000, to precompute the branch metrics for all possible 2^(L) channel states ρ_(n) in a look-ahead fashion outside the critical loop. At each decoding step, the appropriate branch metrics are chosen based on past survivor symbols in the SMU. This approach removes the BMU and DFU from the critical loop. However, the hardware increases exponentially with the channel memory L. Also, the delay through the MUXs, which select the actual branch metrics among all precomputed ones, increases with L, eventually neutralizing the speed gain achieved by the precomputation. The present invention provides a technique that increases the throughput of RSSE by performing precomputations while only leading to a linear increase in hardware with respect to the look-ahead depth.

[0053] One-Step Look-Ahead

[0054] The hardware increase can be limited by taking advantage of past survivor symbols in the SMU and past decisions of the ACSU. This will be shown for precomputations with look-ahead depth one, i.e., possible values for branch metrics needed by the ACSU at time n are already computed at time n−1.

[0055] A partial decision-feedback for reduced state ρ_(n)′ could be calculated by using the L−1 survivor symbols (â_(n−2)(ρ_(n−1)′),â_(n−3)(ρ_(n−1)′), . . . ,â_(n−L)(ρ_(n−1)′)) corresponding to the survivor sequence into ρ_(n−1)′:

$$v_n(\rho_{n-1}') = -\sum_{i=2}^{L} f_i \hat{a}_{n-i}(\rho_{n-1}'). \qquad (12)$$

[0056] Note that the K survivor symbols (â_(n−2)(ρ_(n−1)′),â_(n−3)(ρ_(n−1)′), . . . ,â_(n−K−1)(ρ_(n−1)′)) need not be fed back from the SMU, as they are equal to the symbols defining the state ρ_(n−1)′ (c.f. equation (6)). Therefore, these symbols and their contribution to the partial decision-feedback v_(n)(ρ_(n−1)′) are fixed for a particular state ρ_(n−1)′.

[0057] If ã_(n−1) denotes a possible extension of the sequence (â_(n−2)(ρ_(n−1)′),â_(n−3)(ρ_(n−1)′), . . . ,â_(n−L)(ρ_(n−1)′)), the corresponding tentative decision-feedback is given by

ũ _(n)(ρ_(n−1) ′,ã _(n−1))=v _(n)(ρ_(n−1)′)−f ₁ ã _(n−1),  (13)

[0058] and the tentative branch metric under input a_(n) is

λ̃_(n)′(z _(n) ,a _(n) ,ρ_(n−1)′ ,ã _(n−1))=(z _(n) −a _(n) +ũ _(n)(ρ_(n−1)′ ,ã _(n−1)))².  (14)

[0059] The actual branch metric corresponding to survivor paths into state ρ_(n)′ and input a_(n) can be selected among the tentative branch metrics based on the past decision d_(n−1)(ρ_(n−1)′=ρ_(n)′) according to

λ_(n)′(z _(n) ,a _(n) ,ρ_(n)′)=sel(Λ_(n)(z _(n) ,a _(n) ,ρ_(n)′),d_(n−1)(ρ_(n−1)′=ρ_(n)′)),  (15)

[0060] where Λ_(n)(z_(n),a_(n),ρ_(n)′) is the vector containing the two tentative branch metrics λ̃_(n)′(z_(n),a_(n),ρ_(n−1)′,ã_(n−1)) for input a_(n) and the two possible sequences into ρ_(n)′ from the different predecessor states {ρ_(n−1)′}:

Λ_(n)(z _(n) ,a _(n) ,ρ_(n)′)={λ̃_(n)′(z _(n) ,a _(n) ,ρ_(n−1)′ ,ã _(n−1))},{ρ_(n−1)′}→ρ_(n)′.  (16)

[0061] The branch metrics, which have been selected using equation (15), are used for the ACS operation according to equation (11). As equations (12), (13), (14), (15) and (16) can already be evaluated at time n−1, they are decoupled from the ACS operation according to equation (11) at time n. This leads to an architecture that can achieve a potentially higher throughput than the conventional RSSE implementation. The look-ahead architecture for the RSSE 700 of FIG. 7A (i.e., L=4 and K=1) is shown in FIG. 8. It can be seen that the long critical path in the architecture of FIG. 7A is broken up into two smaller critical paths, as a pipeline stage 825 is placed in front of the ACSU 830. The processing speed of this architecture still depends on the channel memory, as the number of additions and thus the delay along the critical path in the DFU 810 increases with L. In the following, a pipelined RSSE architecture is discussed whose maximum throughput does not depend on L.
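The one-step look-ahead of equations (12) through (16) may be checked numerically with the following sketch for the case K=1, in which the two predecessor states are −1 and +1 and the reduced state ρ_(n)′ equals the tentative symbol ã_(n−1). The function names, the dictionary `surv_by_pred` of predecessor survivors (â_(n−2), . . . ,â_(n−L), most recent first) and the representation of the ACS decision as the winning predecessor state are assumptions of this sketch.

```python
def tentative_bm(z_n, a_n, f, surv_pred, a_tilde):
    """Tentative branch metric for one predecessor survivor and extension a_tilde."""
    L = len(f) - 1
    v = -sum(f[i] * surv_pred[i - 2] for i in range(2, L + 1))  # equation (12)
    u_tilde = v - f[1] * a_tilde                                # equation (13)
    return (z_n - a_n + u_tilde) ** 2                           # equation (14)


def select_bm(z_n, a_n, f, surv_by_pred, a_tilde, decision):
    """Precompute both tentative metrics, then select one with the past decision."""
    candidates = {pred: tentative_bm(z_n, a_n, f, surv, a_tilde)
                  for pred, surv in surv_by_pred.items()}       # equation (16)
    return candidates[decision]                                 # equation (15)


def direct_bm(z_n, a_n, f, surv):
    """Conventional branch metric from the full survivor (a_{n-1}, ..., a_{n-L})."""
    L = len(f) - 1
    u = -sum(f[i] * surv[i - 1] for i in range(1, L + 1))       # equation (10)
    return (z_n - a_n + u) ** 2                                 # equation (8)
```

The selected tentative metric equals the metric that the conventional architecture would compute after the survivor has been updated, but everything except the final selection can be evaluated one symbol period earlier.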

[0062] Multiple-step Look-ahead

[0063] The process of precomputing branch metrics which are needed at time n could already be started at time n−M, where M∈[1;L−K]. A partial decision-feedback corresponding to the survivor sequence (â_(n−M−1)(ρ_(n−M)′),â_(n−M−2)(ρ_(n−M)′), . . . ,â_(n−L)(ρ_(n−M)′)) into ρ_(n−M)′ is given by

$$v_n(\rho_{n-M}') = -\sum_{i=M+1}^{L} f_i \hat{a}_{n-i}(\rho_{n-M}'). \qquad (17)$$

[0064] It is again noted that the K survivor symbols (â_(n−M−1)(ρ_(n−M)′),â_(n−M−2)(ρ_(n−M)′), . . . ,â_(n−M−K)(ρ_(n−M)′)) are identical to the symbols defining the state ρ_(n−M)′, and thus their contribution to v_(n)(ρ_(n−M)′) is fixed for this particular state. A tentative partial decision-feedback for a sequence starting with (â_(n−M−1)(ρ_(n−M)′),â_(n−M−2)(ρ_(n−M)′), . . . ,â_(n−L)(ρ_(n−M)′)) and which is extended by ã_(n−M) can be precomputed as

ũ _(n)(ρ_(n−M) ′,ã _(n−M))=v _(n)(ρ_(n−M)′)−f _(M) ã _(n−M).  (18)

[0065] When the decision d_(n−M)(ρ_(n−M)′=ρ_(n−M+1)′) becomes available, the partial decision-feedback, which corresponds to the survivor sequence (â_(n−M)(ρ_(n−M+1)′),â_(n−M−1)(ρ_(n−M+1)′), . . . ,â_(n−L)(ρ_(n−M+1)′)), can be selected among the precomputed ones:

v _(n)(ρ_(n−M+1)′)=sel(U _(n)(ρ_(n−M+1)′),d_(n−M)(ρ_(n−M)′=ρ_(n−M+1)′)),  (19)

[0066] where U_(n)(ρ_(n−M+1)′) is the vector containing the two precomputed tentative partial decision-feedback values for the two possible path extensions into ρ_(n−M+1)′ from the different predecessor states {ρ_(n−M)′}:

U _(n)(ρ_(n−M+1)′)={ũ_(n)(ρ_(n−M)′ ,ã_(n−M))},{ρ_(n−M)′}→ρ_(n−M+1)′.  (20)

[0067] To be able to eventually precompute tentative branch metrics according to equation (14), the computations described by equations (18), (19) and (20) must be repeated for time steps n−M+1 to n−1 according to the following equations, where M−1≥k≥1:

ũ _(n)(ρ_(n−k) ′,ã _(n−k))=v _(n)(ρ_(n−k)′)−f _(k) ã _(n−k),  (21)

v _(n)(ρ_(n−k+1)′)=sel(U _(n)(ρ_(n−k+1)′),d_(n−k)(ρ_(n−k)′=ρ_(n−k+1)′)),  (22)

U _(n)(ρ_(n−k+1)′)={ũ_(n)(ρ_(n−k)′ ,ã_(n−k))},{ρ_(n−k)′}→ρ_(n−k+1)′.  (23)

[0068] Once ũ_(n)(ρ_(n−1)′,ã_(n−1)) becomes available, tentative branch metrics λ̃_(n)′(z_(n),a_(n),ρ_(n−1)′,ã_(n−1)) can be precomputed according to equation (14) and the appropriate branch metrics are selected according to equations (15) and (16).
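The repeated precompute-and-select steps of equations (18) through (23) may be sketched as follows for K=1, where each reduced state is identified with its single defining symbol (−1 or +1). The function name `multistep_feedback`, the dictionary `v_start` holding the values of equation (17) per state, and the list `decisions` giving the winning predecessor for each state at each step are assumptions of this sketch.

```python
def multistep_feedback(f, M, v_start, decisions):
    """Fold M look-ahead stages into the decision-feedback for each state at time n.

    v_start[s]      : v_n for state s at time n-M, equation (17)
    decisions[j][s] : winning predecessor of state s at stage k = M - j
    """
    v = dict(v_start)
    for j, k in enumerate(range(M, 0, -1)):
        new_v = {}
        for s in (-1, 1):  # with K = 1 the new state equals the tentative symbol
            # Two tentative values, one per predecessor: equations (18)/(21), (20)/(23).
            candidates = {pred: v[pred] - f[k] * s for pred in (-1, 1)}
            # Select with the ACS decision: equations (19)/(22).
            new_v[s] = candidates[decisions[j][s]]
        v = new_v
    return v
```

Each stage holds one adder and one 2-to-1 selection per tentative value, so the number of tentative values grows in proportion to M rather than exponentially in the channel memory. For L=4, K=1 and M=3, `v_start[s]` is simply −f₄·s, because the only remaining survivor symbol is the one defining the state.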

[0069] The architecture for RSSE 900 with look-ahead depth M=3 and the parameters L=4 and K=1 is shown in FIG. 9. It can be seen that in total M=3 pipeline stages are available. Two pipeline stages 912, 916 have been placed inside the DFU 910, and one pipeline stage 925 has been placed between the BMU 920 and ACSU 930. The connection network in the DFU 910 resembles the structure of the underlying trellis from FIG. 3, as past decisions from the ACSU 930 are used to extend the partial survivor sequences by the subsequent survivor symbol. As the LUT typically has a delay comparable to an adder, the critical path of this implementation is determined by an add-compare in the ACSU 930 (2 ADDs) and the storage of the most recent decision in the SMU 940 or the selection of an appropriate value with a 2-to-1 MUX in the DFU 910 or BMU 920.

[0070] The complexity and critical path of pipelined RSSE using multiple-step look-ahead computations is shown in FIG. 10. It can be seen in FIG. 10 that the hardware overhead for performing precomputations scales only linearly with the look-ahead depth M. Choosing M=L−K as in FIG. 9 leads to an architecture where the critical path is reduced to be of the same order as in MLSE (c.f. FIG. 6) and does not depend on the channel memory L. In a further variation, the precomputed partial ISI estimates in the pipelined DFU 910 may be processed in groups of taps, with a precomputation for all possible data symbol combinations corresponding to the groups of taps and selection for each group, as would be apparent to a person of ordinary skill in the art.

[0071] FIGS. 11 and 12 illustrate alternate implementations of the RSSEs shown in FIGS. 8 and 9, respectively, where pipeline registers are placed differently (now before the MUX, at stages 1125 and 1225, respectively) using a cut-set or re-timing transformation technique. For a more detailed discussion of cut-set or re-timing transformation techniques, see P. Pirsch, Architectures for Digital Signal Processing, New York, Wiley (1998), incorporated by reference herein. The present invention encompasses all derivations that can be achieved using such transformations, as would be apparent to a person of ordinary skill in the art.

Joint Postcursor Equalization and Trellis Decoding for 1000BASE-T Gigabit Ethernet

[0072] An exemplary embodiment employs the 1000BASE-T physical layer standard that specifies Gigabit Ethernet over four pairs of Category 5 unshielded twisted pair (UTP) copper cabling, as described in M. Hatamian et al., “Design Considerations for Gigabit Ethernet 1000Base-T Twisted Pair Transceivers,” Proc. IEEE Custom Integrated Circuits Conf. (CICC), Santa Clara, Calif., 335-342 (May 1998); or K. Azadet, “Gigabit Ethernet Over Unshielded Twisted Pair Cables,” Proc. Int. Symp. VLSI Technology, Systems, Applications (VLSI-TSA), Taipei, 167-170 (June 1999). It is noted that hereinafter, all variables will be defined in a new way. Although the meaning of variables used in this second part of this detailed description may be related to the definition of the variables in the previous part of the description, they might not exactly have the same meaning. All variables used hereinafter, however, will be described and defined in a precise way and their meaning is valid for this second part of the detailed description only.

[0073] The throughput of 1 Gb/s is achieved in 1000BASE-T by full duplex transmission of pulse amplitude modulated signals with the five levels {−2, −1, 0, 1, 2} (PAM5), resulting in a data rate of 250 Mb/s per wire pair. By grouping four PAM5 symbols transmitted over the four different wire channels, a four-dimensional (4D) symbol is formed which carries eight information bits.

[0074] Thus, the symbol rate is 125 Mbaud, which corresponds to a symbol period of 8 ns. To achieve a target bit error rate of less than 10⁻¹⁰, the digital signal processor (DSP) section of a 1000BASE-T receiver must cancel intersymbol interference (ISI), echo and near-end crosstalk (NEXT). 1000BASE-T improves the noise margin by employing trellis-coded modulation (TCM). For a detailed discussion of trellis-coded modulation techniques, see, for example, G. Ungerboeck, “Trellis-Coded Modulation With Redundant Signal Sets, Parts I and II,” IEEE Commun. Mag., Vol. 25, 5-21 (February 1987), incorporated by reference herein.
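By way of illustration only, the rate relationships stated above can be checked with a short Python sketch. The constant and function names are hypothetical and are introduced here solely for the example; only the numerical figures (four wire pairs, eight information bits per 4D symbol, 250 Mb/s per wire pair) are taken from the description.

```python
# Rate arithmetic for 1000BASE-T using only figures stated in the description.
# All names below are illustrative assumptions, not part of the disclosure.
WIRE_PAIRS = 4                  # four wire pairs
INFO_BITS_PER_4D_SYMBOL = 8     # eight information bits per 4D symbol
PER_PAIR_RATE_BPS = 250e6       # 250 Mb/s per wire pair


def info_bits_per_1d_symbol():
    """Information bits carried, on average, by one PAM5 symbol on one pair."""
    return INFO_BITS_PER_4D_SYMBOL / WIRE_PAIRS


def symbol_rate_baud():
    """Symbol rate per wire pair implied by the per-pair data rate."""
    return PER_PAIR_RATE_BPS / info_bits_per_1d_symbol()


def symbol_period_ns():
    """Time available per decoding step, in nanoseconds."""
    return 1e9 / symbol_rate_baud()


def aggregate_throughput_bps():
    """Total throughput over all wire pairs."""
    return WIRE_PAIRS * PER_PAIR_RATE_BPS
```

The symbol period returned by this sketch is the time budget that bounds the critical path discussed in the remainder of this section.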

[0075] For coding purposes, the 1D PAM5 symbols are partitioned into two one-dimensional (1D) subsets A={−1,1} and B={−2,0,2}. By grouping different combinations of the 1D subsets together which are transmitted over the four wire pairs, the eight 4D subsets S0, S1, . . . , S7 are formed. The 8-state, radix-4 code trellis specified by 1000BASE-T is shown in FIG. 13. ρ_(n) in FIG. 13 denotes the state of the trellis code at time n (i.e., ρ_(n) is no longer defined by equation (2), as noted at the beginning of this section). Each transition in the trellis diagram 1300 corresponds to one of the specified eight 4D subsets. There are 64 parallel transitions per state transition. Due to the 4D subset partitioning and labeling of the transitions in the code trellis, the minimum squared Euclidean distance between allowed sequences is Δ²=4, which corresponds to an asymptotic coding gain of 6 dB (10 log₁₀ 4) over uncoded PAM5 in an ISI-free channel.
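The figures in the preceding paragraph can be sanity-checked with the following Python sketch, offered as an illustration under stated assumptions. It assumes that the reference squared distance of uncoded PAM5 is normalized to 1 (adjacent levels differ by one), which is what the expression 10 log₁₀ 4 implies. It also checks that four branches per state with 64 parallel transitions each account for the 2⁸ values of the eight information bits. The sketch computes distances within the 1D subsets only; it does not derive the sequence distance of the trellis code. All names are hypothetical.

```python
import math
from itertools import combinations

SUBSET_A = (-1, 1)                        # 1D subset A from the description
SUBSET_B = (-2, 0, 2)                     # 1D subset B from the description
PAM5 = tuple(sorted(SUBSET_A + SUBSET_B))

BRANCHES_PER_STATE = 4                    # radix-4 trellis
PARALLEL_TRANSITIONS = 64                 # parallel transitions per state transition


def min_squared_distance(points):
    """Smallest squared Euclidean distance between two distinct 1D points."""
    return min((a - b) ** 2 for a, b in combinations(points, 2))


def asymptotic_coding_gain_db(coded_sq_distance, uncoded_sq_distance=1.0):
    """Asymptotic coding gain in dB; the reference distance of 1 is an assumption."""
    return 10.0 * math.log10(coded_sq_distance / uncoded_sq_distance)
```

With these definitions, `asymptotic_coding_gain_db(4.0)` evaluates to approximately 6.02 dB, in line with the 6 dB figure given above.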

[0076] In a 1000BASE-T receiver, feedforward equalizers, echo cancellers and NEXT cancellers remove precursor ISI, echo and NEXT, respectively. The remaining DSP processing removes the postcursor ISI, which typically spans 14 symbol periods, and decodes the trellis code. It has been shown in E. F. Haratsch, “High-Speed VLSI Implementation of Reduced Complexity Sequence Estimation Algorithms With Application to Gigabit Ethernet 1000BASE-T,” Proc. Int. Symp. VLSI Technology, Systems, Applications (VLSI-TSA), Taipei, 171-174 (June 1999) that parallel decision-feedback decoding, a special case of reduced-state sequence estimation (see M. V. Eyuboglu and S. U. Qureshi, “Reduced-State Sequence Estimation for Coded Modulation on Intersymbol Interference Channels,” IEEE J. Sel. Areas Commun., Vol. 7, 989-95 (Aug. 1989)), offers the best trade-off for this task with respect to SNR performance, hardware complexity and critical path. However, the integration of a 125 MHz, 14-tap parallel decision-feedback decoder (PDFD) is quite challenging because of the critical path problem.

[0077] A simplified postcursor equalization and trellis decoding structure was presented in E. F. Haratsch and K. Azadet, “A Low Complexity Joint Equalizer and Decoder for 1000BASE-T Gigabit Ethernet,” Proc. IEEE Custom Integrated Circuits Conf. (CICC), Orlando, 465-68 (May 2000), where decision-feedback prefilters shorten the postcursor impulse response to one postcursor. Then exhaustive precomputation of all possible 1D branch metrics is possible, substantially reducing the critical path of the remaining 1-tap PDFD. However, the postcursor equalization and trellis decoding structure suffers from a performance degradation of 1.3 dB compared to a 14-tap PDFD.

[0078] The present invention thus provides a pipelined 14-tap PDFD architecture, which operates at the required processing speed of 125 MHz without any coding gain loss. To achieve this, the look-ahead technique discussed above for uncoded signals impaired by ISI, where the underlying trellis has no parallel state transitions, is extended to trellis codes with parallel transitions like the one specified in 1000BASE-T. The processing blocks of the disclosed architecture, which differ from a conventional PDFD design, are described below.

Parallel Decision-Feedback Decoding Algorithm

[0079] Parallel decision-feedback decoding combines postcursor equalization with TCM decoding by computing separate ISI estimates for each code state before applying the well known Viterbi algorithm (see, e.g., G. D. Forney, Jr., “The Viterbi Algorithm,” Proc. IEEE, Vol. 61, 268-78 (Mar. 1973)) to decode the trellis code. An ISI estimate for wire pair j and code state ρ_(n) at time n is given by

$u_{n,j}(\rho_n) = \sum_{i=1}^{14} f_{i,j}\,\hat{a}_{n-i,j}(\rho_n),$

[0080] where {f_(i,j)} are the postcursor channel coefficients for wire pair j and â_(n−i,j)(ρ_(n)) is the j-th dimension of the 4D survivor symbol â_(n−i)(ρ_(n))=(â_(n−i,1)(ρ_(n)),â_(n−i,2)(ρ_(n)),â_(n−i,3)(ρ_(n)),â_(n−i,4)(ρ_(n))) which belongs to the survivor sequence into ρ_(n) and corresponds to time n−i. As there are eight code states and four wire pairs, 32 ISI estimates are calculated at each decoding step. In a straightforward PDFD implementation, the calculation of the ISI estimates in the decision-feedback unit introduces a recursive loop, which also includes the branch metric unit (BMU), add-compare-select unit (ACSU) and survivor memory unit (SMU). As the clock rate is 125 MHz in 1000BASE-T, there are only 8 ns available for the operations along this critical path. As conventional pipelining techniques cannot be applied to improve throughput due to the recursive nonlinear structure of this loop, it is extremely challenging to implement a 125 MHz, 14-tap PDFD for 1000BASE-T Gigabit Ethernet. When a state-parallel 14-tap PDFD is implemented using VHDL and synthesis in a 3.3 V 0.16 μm standard cell CMOS process, the design would only achieve a throughput of approximately 500 Mb/s, and the hardware complexity would be 158 kGates. In the following, a pipelined 14-tap PDFD architecture is disclosed that achieves the required throughput of 1 Gb/s without sacrificing coding gain performance.
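A minimal software sketch of the per-state ISI estimate defined above is given below for illustration. It is not a description of the hardware decision-feedback unit. The data layout (nested lists indexed by state and wire pair, with survivor symbols ordered most recent first) and all function names are assumptions made for the example.

```python
NUM_STATES = 8      # eight code states
NUM_PAIRS = 4       # four wire pairs


def isi_estimate(postcursors, survivor_symbols):
    """u_{n,j}(rho_n) = sum_i f_{i,j} * a_hat_{n-i,j}(rho_n).

    postcursors[i-1] holds f_{i,j}; survivor_symbols[i-1] holds the
    survivor symbol for time n-i (most recent first).
    """
    if len(postcursors) != len(survivor_symbols):
        raise ValueError("one survivor symbol is needed per postcursor tap")
    return sum(f * a for f, a in zip(postcursors, survivor_symbols))


def all_isi_estimates(postcursors_per_pair, survivors):
    """Compute one estimate per (state, wire pair).

    postcursors_per_pair[j] lists the taps for wire pair j;
    survivors[state][j] lists the survivor symbols of that state on pair j.
    """
    return {
        (state, j): isi_estimate(postcursors_per_pair[j], survivors[state][j])
        for state in range(NUM_STATES)
        for j in range(NUM_PAIRS)
    }
```

For eight states and four wire pairs, `all_isi_estimates` returns 32 values, matching the count given above. In the straightforward architecture each of these values depends on the survivor symbols of the immediately preceding step, which is the source of the recursive loop.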

Pipelined 14-Tap PDFD Architecture

[0081] The parallel decision-feedback decoding algorithm was reformulated above such that pipelining of the computation of the ISI estimates and branch metrics is possible. ISI estimates and branch metrics are precomputed in a look-ahead fashion to bring the DFU and BMU out of the critical loop (see FIGS. 8 and 9 and corresponding discussion). Using ACS decisions to prune the look-ahead computation tree mitigates the exponential growth of the computational complexity with respect to the look-ahead depth. The above discussion only addressed the case where parallel decision-feedback decoding or other RSSE variants are used for equalization (and trellis decoding) of signals impaired by ISI, where the underlying trellis has no parallel state transitions.

[0082] In the following discussion, the look-ahead computation concept discussed above is extended to systems where the parallel decision-feedback decoding algorithm or other RSSE variants are used for equalization and/or trellis decoding where the underlying trellis has parallel state transitions. In particular, an exemplary pipelined, 14-tap PDFD architecture with look-ahead depth two is presented which meets the throughput requirement of 1000BASE-T. The present invention can be generalized to other look-ahead depths, trellis codes, modulation schemes, RSSE variants and numbers of postcursor taps, as would be apparent to a person of ordinary skill in the art.

[0083] The disclosed pipelined PDFD architecture, which decodes the 1000BASE-T trellis code and equalizes the ISI due to 14 postcursors, is shown in FIG. 14. Speculative ISI estimates which are used for the ACS decisions corresponding to state transitions {ρ_(n+2)}→{ρ_(n+3)} are computed in the look-ahead DFU (LA-DFU) 1412 using information already available at time n, i.e., two clock cycles ahead of time. Therefore, the look-ahead depth is two. The appropriate ISI estimates are selected in the ISI-multiplexer unit (ISI-MUXU) 1416 based on ACS decisions (from 1440) and survivor symbols (from 1450). Speculative 1D branch metrics are precomputed one decoding step in advance in the 1D-LA-BMU 1424. Again, ACS decisions and survivor symbols are used to select the appropriate 1D branch metrics in the 1D-BM-MUXU 1428. The selected 1D branch metrics are added up in the 4D-BMU 1430 to compute the 4D branch metrics, which correspond to state transitions of the code trellis 1300 shown in FIG. 13. The best survivor path for each code state is determined in the ACSU 1440, and the eight survivor paths are stored in the SMU 1450.

[0084] Compared to a conventional PDFD implementation as described in E. F. Haratsch, “High-Speed VLSI Implementation of Reduced Complexity Sequence Estimation Algorithms With Application to Gigabit Ethernet 1000 Base-T,” Int'l Symposium on VLSI Technology, Systems, and Applications, Taipei (Jun. 1999), the DFU and 1D-BMU are outside the critical loop, as there is a pipeline stage 1418 between the DFU and 1D-BMU and another pipeline stage 1429 between the 1D-BMU and 4D-BMU. The critical path in the architecture of FIG. 14 includes only the 4D-BMU 1430, ACSU 1440 and SMU 1450. The contribution of the 1D-BM-MUXU 1428 and ISI-MUXU 1416 to the critical path is low. Therefore, the proposed PDFD architecture achieves a throughput twice as high as a conventional PDFD implementation. The proposed PDFD architecture differs from the pipelined structure developed for trellises without parallel state transitions in FIGS. 8 and 9 with respect to the selection of the appropriate ISI estimates and 1D branch metrics in the ISI-MUXU 1416 and 1D-BM-MUXU 1428. As 1000BASE-T employs TCM with parallel state transitions, not only ACS decisions, but also the most recent survivor symbols are required for the selection of the appropriate values, as there is not a unique relationship between ACS decisions and survivor symbols. In the following, the implementations of the DFU 1410, 1D-BMU 1420 and SMU 1450 are described in detail. The implementation of the 4D-BMU 1430 and ACSU 1440 is the same as in a conventional PDFD and is already described in E. F. Haratsch and K. Azadet, “A Low Complexity Joint Equalizer and Decoder for 1000BASE-T Gigabit Ethernet,” Proc. IEEE Custom Integrated Circuits Conf. (CICC), Orlando, 465-468 (May 2000).

Decision-Feedback Unit

[0085] Exhaustive precomputation of ISI estimates is not feasible in 1000BASE-T without prefiltering, as the number of possible ISI estimates grows exponentially with the number of postcursors. As there are 14 postcursors and four wire pairs, and PAM5 modulation is being used, there are in total 4×5¹⁴≈2×10¹⁰ possible ISI estimates that would have to be precomputed. Precomputing ISI estimates using a limited look-ahead depth reduces the complexity. The exponential growth of the number of precomputed ISI estimates is mitigated as the precomputation is not completely decoupled from the ACS and survivor symbol decisions. Survivor symbols available at time n are used for the computation of ISI estimates corresponding to state transitions {ρ_(n+2)}→{ρ_(n+3)}. Then, the look-ahead computation tree is pruned using ACS and survivor symbol decisions available at time n.
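The scale of exhaustive precomputation can be illustrated with the short Python sketch below. The second function models, as an assumption made for this example, a limited look-ahead in which only a given number of symbols per code state and wire pair are left speculative while the remaining taps use survivor symbols. The function names are hypothetical.

```python
def exhaustive_isi_count(wire_pairs, levels, postcursors):
    """Number of ISI estimates if every symbol of every tap is speculative."""
    return wire_pairs * levels ** postcursors


def limited_lookahead_isi_count(states, wire_pairs, levels, speculative_symbols):
    """Number of estimates when only a few symbols per state and wire pair
    are speculative (illustrative model, see the lead-in)."""
    return states * wire_pairs * levels ** speculative_symbols
```

For example, `exhaustive_isi_count(4, 5, 14)` evaluates to 24,414,062,500, which is the 4×5¹⁴ figure quoted above, whereas `limited_lookahead_isi_count(8, 4, 5, 1)` evaluates to 160.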

[0086] Look-Ahead Computation of ISI Estimates (LA-DFU)

[0087] An estimate v_(n+2,j)(ρ_(n)) for the partial ISI due to the channel coefficients {f_(3,j), f_(4,j), . . . , f_(14,j)}, which corresponds to a state transition ρ_(n+2)→ρ_(n+3), can be calculated by using the symbols from the survivor path into state ρ_(n) which are available at time n:

$v_{n+2,j}(\rho_n) = -\sum_{i=1}^{12} f_{i+2,j}\,\hat{a}_{n-i,j}(\rho_n).$

[0088] A speculative partial ISI estimate ũ_(n+2,j)(ρ_(n),ã_(n,j)), which also considers the ISI due to f_(2,j) and assumes that ã_(n,j) is the 1D symbol for the corresponding transition ρ_(n)→ρ_(n+1), is calculated as

$\tilde{u}_{n+2,j}(\rho_n, \tilde{a}_{n,j}) = v_{n+2,j}(\rho_n) - f_{2,j}\,\tilde{a}_{n,j}.$

[0089] As there are five possibilities for ã_(n,j) due to the PAM5 modulation, five different partial ISI estimates must be computed per code state and wire pair in the LA-DFU 1412 as shown in FIG. 15. In total, 160 (8×4×5) such ISI estimates are precomputed in the LA-DFU 1412.
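The look-ahead computation for one code state and one wire pair may be sketched in Python as follows. This is an illustrative software model under stated assumptions, not the circuit of FIG. 15. The list layout (`f[k-1]` holding f_(k,j), survivor symbols ordered most recent first) and the function names are hypothetical.

```python
PAM5 = (-2, -1, 0, 1, 2)


def partial_isi_v(f, survivor_symbols):
    """v_{n+2,j}(rho_n) = -sum_{i=1..12} f_{i+2,j} * a_hat_{n-i,j}(rho_n).

    f: the 14 postcursor coefficients of one wire pair, f[k-1] = f_{k,j}.
    survivor_symbols: 12 symbols, survivor_symbols[i-1] = a_hat_{n-i,j}(rho_n).
    """
    if len(f) != 14 or len(survivor_symbols) != 12:
        raise ValueError("expected 14 taps and 12 survivor symbols")
    return -sum(f[i + 1] * survivor_symbols[i - 1] for i in range(1, 13))


def speculative_partial_isi(f, survivor_symbols):
    """u_tilde_{n+2,j}(rho_n, a_tilde) = v - f_{2,j} * a_tilde, for each of
    the five PAM5 hypotheses a_tilde. Returns a dict keyed by a_tilde."""
    v = partial_isi_v(f, survivor_symbols)
    return {a_tilde: v - f[1] * a_tilde for a_tilde in PAM5}
```

Calling `speculative_partial_isi` once per code state and wire pair yields 8×4×5=160 values, consistent with the count stated above.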

[0090] Selection of ISI Estimates (ISI-MUXU)

[0091] The appropriate partial ISI estimate v_(n+2,j)(ρ_(n+1)), which considers the symbols from the survivor path into ρ_(n+1) and the channel coefficients {f_(2,j),f_(3,j), . . . ,f_(14,j)}, can be selected among the precomputed partial ISI estimates ũ_(n+2,j)(ρ_(n),ã_(n,j)) when the best survivor path into state ρ_(n+1) and the corresponding 4D survivor symbol â_(n,j)(ρ_(n+1)) become available. This selection in the ISI-MUXU 1416 is shown in FIG. 16 for a particular wire pair j and state ρ_(n+1)=0. The partial ISI estimate v_(n+2,j)(ρ_(n+1)=0) is selected among 20 (4×5) precomputed partial ISI estimates {ũ_(n+2,j)(ρ_(n),ã_(n,j))}, {ρ_(n)}→ρ_(n+1)=0, as there are four contender paths from the states ρ_(n)=0, 2, 4 and 6 leading into state ρ_(n+1)=0. Also, for each of these contender paths leading into state ρ_(n+1), five different partial ISI estimates ũ_(n+2,j)(ρ_(n),ã_(n,j)) corresponding to different values for ã_(n,j) are possible. As shown in FIG. 16, the selection of the appropriate partial ISI estimate v_(n+2,j)(ρ_(n+1)) is performed in two stages. First, the ACS decision d_(n)(ρ_(n+1)) selects the five speculative partial ISI estimates, which correspond to the selected survivor path into ρ_(n+1), but assume different values for ã_(n,j). Then, the survivor symbol â_(n,j)(ρ_(n+1)) selects the appropriate partial ISI estimate v_(n+2,j)(ρ_(n+1)) which assumed â_(n,j)(ρ_(n+1)) as value for ã_(n,j). Both d_(n)(ρ_(n+1)) and â_(n,j)(ρ_(n+1)) become available at the end of the clock cycle corresponding to state transitions {ρ_(n)}→{ρ_(n+1)}. The output of the ISI-MUXU is 32 (8×4) partial ISI estimates v_(n+2,j)(ρ_(n+1)), as there are eight states and four wire pairs.
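The two-stage selection described above can be modeled in software as shown in the following Python sketch, given for illustration only. The encoding of the ACS decision as an index into the tuple of contender states, the dictionary layout of the speculative estimates, and the function name are assumptions of this example and are not taken from FIG. 16.

```python
def select_partial_isi(speculative, contenders, acs_decision, survivor_symbol):
    """Select v_{n+2,j}(rho_{n+1}) among 4 x 5 speculative estimates.

    speculative: dict mapping a predecessor state rho_n to a dict that maps
        each hypothesis a_tilde to u_tilde_{n+2,j}(rho_n, a_tilde).
    contenders: the four predecessor states leading into rho_{n+1}.
    acs_decision: index (0..3) of the winning contender (assumed encoding).
    survivor_symbol: a_hat_{n,j}(rho_{n+1}), the 1D survivor symbol.
    """
    # Stage 1: the ACS decision keeps the five estimates of the winning path.
    winning_state = contenders[acs_decision]
    five_candidates = speculative[winning_state]
    # Stage 2: the survivor symbol picks the estimate whose hypothesis matched.
    return five_candidates[survivor_symbol]
```

For state ρ_(n+1)=0, `contenders` would be `(0, 2, 4, 6)` as stated above. Repeating the selection for every state and wire pair yields the 32 outputs of the ISI-MUXU.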

1D Branch Metric Unit

[0092] The 1D-BMU 1420 consists of two processing blocks. The 1D-LA-BMU 1424 takes the partial ISI estimates {v_(n+1,j)(ρ_(n))} computed in the DFU to calculate speculative 1D branch metrics. In the 1D-BM-MUXU 1428, the appropriate 1D branch metrics are selected using ACS decisions and corresponding survivor symbols.

[0093] Look-ahead Computation of 1D Branch Metrics (1D-LA-BMU)

[0094] The 1D-LA-BMU 1424 precomputes speculative 1D branch metrics which are then needed in the 4D-BMU 1430 one clock cycle later. Input into the 1D-LA-BMU 1424 are the partial ISI estimates {v_(n+1,j)(ρ_(n))}, which correspond to trellis transitions {ρ_(n+1)}→{ρ_(n+2)}. These ISI estimates consider the channel coefficients {f_(2,j),f_(3,j), . . . ,f_(14,j)} and the symbols from the survivor path into state ρ_(n). A speculative partial ISI estimate ũ_(n+1,j)(ρ_(n),ã_(n,j)), which also considers the ISI due to the channel coefficient f_(1,j) and assumes that ã_(n,j) is the 1D symbol corresponding to the transition ρ_(n)→ρ_(n+1), is given by:

$\tilde{u}_{n+1,j}(\rho_n, \tilde{a}_{n,j}) = v_{n+1,j}(\rho_n) - f_{1,j}\,\tilde{a}_{n,j}.$

[0095] The speculative 1D branch metric for a transition from state ρ_(n+1) under the symbol a_(n+1,j), assuming that the corresponding survivor path contains the survivor sequence into state ρ_(n) and is extended by the symbol ã_(n,j) to reach state ρ_(n+1), is given by

$\tilde{\lambda}_{n+1,j}(z_{n+1,j}, a_{n+1,j}, \rho_n, \tilde{a}_{n,j}) = \left(z_{n+1,j} - a_{n+1,j} + \tilde{u}_{n+1,j}(\rho_n, \tilde{a}_{n,j})\right)^2.$

[0096] The precomputation of speculative 1D branch metrics for a particular initial state ρ_(n) and wire pair j is shown in FIG. 17, where the slicers calculate the difference between the slicer input and the closest symbol in the 1D subsets A and B, respectively. As there are four wire pairs, eight code states, five possibilities for ã_(n,j) (due to the PAM5 modulation), and two possibilities for a_(n+1,j) (A-type or B-type 1D symbol), in total 320 (8×4×5×2) different speculative 1D branch metrics are precomputed in the 1D-LA-BMU 1424.
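A software sketch of the speculative 1D branch metric computation for one initial state and one wire pair is given below for illustration. It assumes that z_(n+1,j) is the received sample for wire pair j at time n+1 and that the slicer input is the sum of that sample and the speculative partial ISI estimate, in line with the branch metric expression above. The function names and the dictionary layout are hypothetical and do not describe the circuit of FIG. 17.

```python
SUBSET_A = (-1, 1)
SUBSET_B = (-2, 0, 2)
PAM5 = (-2, -1, 0, 1, 2)


def slicer_error(x, subset):
    """Difference between the slicer input x and the closest subset symbol."""
    return min((x - s for s in subset), key=abs)


def speculative_1d_branch_metrics(z, v, f1):
    """Speculative metrics for one initial state rho_n and one wire pair.

    z:  received sample z_{n+1,j} (assumed meaning, see lead-in).
    v:  partial ISI estimate v_{n+1,j}(rho_n).
    f1: first postcursor coefficient f_{1,j}.
    Returns a dict keyed by (a_tilde, subset_name) with 5 x 2 = 10 entries.
    """
    metrics = {}
    for a_tilde in PAM5:
        u_tilde = v - f1 * a_tilde          # speculative partial ISI estimate
        x = z + u_tilde                     # slicer input
        for name, subset in (("A", SUBSET_A), ("B", SUBSET_B)):
            metrics[(a_tilde, name)] = slicer_error(x, subset) ** 2
    return metrics
```

Ten metrics per state and wire pair, over eight states and four wire pairs, give the 320 speculative 1D branch metrics mentioned above.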

[0097] Selection of 1D Branch Metrics (1D-BM-MUXU)

[0098] The appropriate 1D branch metric λ_(n+1,j)(z_(n+1,j),a_(n+1,j),ρ_(n+1)), which corresponds to a transition from state ρ_(n+1) under the 1D symbol a_(n+1,j), is selected among 4×5=20 precomputed 1D branch metrics λ̃_(n+1,j)(z_(n+1,j),a_(n+1,j),ρ_(n),ã_(n,j)), as there are four path extensions from different states {ρ_(n)} into ρ_(n+1) and five possibilities for ã_(n,j) due to the PAM5 modulation. The selection of a particular λ_(n+1,j)(z_(n+1,j),a_(n+1,j),ρ_(n+1)) in the 1D-BM-MUXU 1428 is performed using the same multiplexer structure as shown in FIG. 16. First, the ACS decision d_(n)(ρ_(n+1)) determines the five speculative 1D branch metrics, which correspond to the state ρ_(n) being part of the survivor path into ρ_(n+1). Then, the survivor symbol â_(n,j)(ρ_(n+1)) selects among these five metrics the one which assumed â_(n,j)(ρ_(n+1)) as value for ã_(n,j). The 1D-BM-MUXU 1428 selects in total 64 (8×4×2) actual 1D branch metrics, as there are eight states, four wire pairs and the two 1D subset types A and B.

Survivor Memory Unit

[0099] The merge depth of the exemplary 1000BASE-T trellis code is 14. The SMU must be implemented using the register-exchange architecture described in R. Cypher and C. B. Shung, “Generalized Trace-Back Techniques for Survivor Memory Management in the Viterbi Algorithm,” J. VLSI Signal Processing, Vol. 5, 85-94 (1993), as the survivor symbols corresponding to the time steps n−12, n−11, . . . , n are needed in the DFU without delay and the latency budget specified in the 1000BASE-T standard is very tight. The proposed register-exchange architecture with merge depth 14 is shown in FIG. 18, where only the first row storing the survivor sequence corresponding to state zero is shown. SX_(n)(ρ_(n)) denotes the 4D symbol decision corresponding to 4D subset SX and a transition from state ρ_(n). The multiplexers in the first column select the 4D survivor symbols {â_(n)(ρ_(n+1))}, which are part of the survivor path into {ρ_(n+1)}. These 4D survivor symbols are required in the ISI-MUXU 1416 and 1D-BM-MUXU 1428 to select the appropriate partial ISI estimates and 1D branch metrics, respectively. The survivor symbols {â_(n−1)(ρ_(n)), â_(n−2)(ρ_(n)), . . . , â_(n−12)(ρ_(n))} which are stored in the registers corresponding to the first, second, . . . , 12th column are used in the LA-DFU 1412 to compute the partial ISI estimates v_(n+2,j)(ρ_(n)).

[0100] It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.
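The register-exchange principle referred to above can be illustrated with the following generic Python sketch. It is a behavioral model only: each state keeps a row of survivor symbols (most recent first), and at every step each row is overwritten with the row of the winning predecessor state, extended by the newly decided symbol. The data layout, the encoding of ACS decisions as indices, and the function name are assumptions of this example; the two-state trellis mentioned in the usage note is a toy case and not the 1000BASE-T trellis of FIG. 13.

```python
def register_exchange_step(rows, predecessors, acs_decisions, new_symbols, depth):
    """Advance a register-exchange survivor memory by one decoding step.

    rows: dict state -> list of survivor symbols, most recent first.
    predecessors: dict state -> tuple of contender predecessor states.
    acs_decisions: dict state -> index of the winning predecessor.
    new_symbols: dict state -> survivor symbol of the winning transition.
    depth: number of symbols retained per state (the merge depth).
    """
    updated = {}
    for state, contenders in predecessors.items():
        winner = contenders[acs_decisions[state]]
        # Copy the winner's row and shift in the new symbol at the front.
        updated[state] = ([new_symbols[state]] + rows[winner])[:depth]
    return updated
```

As a usage example with a hypothetical two-state trellis, starting from rows `{0: [1, 2], 1: [3, 4]}` where state 0 selects predecessor 1 and state 1 selects predecessor 0, one step with new symbols 9 and 8 and depth 3 gives `{0: [9, 3, 4], 1: [8, 1, 2]}`. Because every column of every row is available in each step, the most recent survivor symbols can be read without trace-back delay, which is the property relied upon above.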

We claim:
 1. A method for processing a signal received from a dispersive channel using a reduced-state sequence estimation technique, said channel having a channel impulse response, said method comprising the steps of: precomputing intersymbol interference estimates based on a combination of (i) speculative partial intersymbol interference estimates for a first postcursor tap of said channel impulse response, wherein said speculative intersymbol interference estimates are based on each possible value for a data symbol, and (ii) a combination of partial intersymbol interference estimates for each subsequent postcursor tap of said channel impulse response, wherein at least one of said partial intersymbol interference estimates for said subsequent postcursor taps is based on a first past decision from a corresponding state; precomputing branch metrics based on said precomputed intersymbol interference estimates; selecting one of said precomputed branch metrics based on a second past decision from a corresponding state; computing a new path metric for a path extension from a corresponding state based on said selected branch metrics; and determining a best survivor path into a state by selecting a path having a best new path metric among said corresponding computed new path metrics.
 2. The method of claim 1, wherein said partial intersymbol interference estimates equal a channel coefficient multiplied by a data symbol value.
 3. The method of claim 1, wherein said first or second past decisions from a corresponding state include a survivor symbol.
 4. The method of claim 1, wherein said first or second past decision from a corresponding state includes an add-compare-select decision.
 5. The method of claim 1, wherein said path metric is an accumulation of said corresponding branch metrics over time.
 6. The method of claim 1, wherein said best path metric is a minimum or maximum path metric.
 7. The method of claim 1, wherein said reduced-state sequence estimation technique is selected from the group consisting essentially of (i) a decision-feedback sequence estimation technique; (ii) a delayed decision-feedback sequence estimation technique; or (iii) a parallel decision-feedback decoding technique.
 8. The method of claim 1, wherein said method allows said reduced-state sequence estimation technique to be pipelined before or after each of said selections.
 9. The method of claim 1, wherein said signal is a multi-dimensional signal, and transitions in a trellis processed by said reduced-state sequence estimation technique correspond to multi-dimensional symbols, wherein said steps of precomputing and selecting branch metrics comprise the steps of: precomputing one-dimensional branch metrics based on said precomputed intersymbol interference estimates; selecting one of said precomputed one-dimensional branch metrics based on a past decision from a corresponding state; and combining said selected one-dimensional branch metrics to obtain a multi-dimensional branch metric.
 10. The method of claim 1, wherein said signal is a multi-dimensional signal, and transitions in a trellis processed by said reduced-state sequence estimation technique correspond to multi-dimensional symbols, wherein said steps of precomputing and selecting branch metrics comprise the steps of: precomputing one-dimensional branch metrics based on said precomputed intersymbol interference estimates; combining said one-dimensional branch metrics to precompute at least two-dimensional branch metrics; and selecting one of said precomputed at least two-dimensional branch metrics based on a past decision from a corresponding state.
 11. The method of claim 10, wherein said selection of an appropriate at least two-dimensional branch metric corresponding to a particular state is based on at least two-dimensional survivor symbols from a corresponding state.
 12. A method for processing a signal received from a dispersive channel using a reduced-state sequence estimation technique, said channel having a channel impulse response, said method comprising the steps of: precomputing intersymbol interference estimates based on a combination of (i) speculative partial intersymbol interference estimates for a first postcursor tap of said channel impulse response, wherein said speculative intersymbol interference estimates are based on each possible value for a data symbol, and (ii) a combination of partial intersymbol interference estimates for each subsequent postcursor tap of said channel impulse response, wherein at least one of said partial intersymbol interference estimates for said subsequent postcursor taps is based on a first past decision from a corresponding state; selecting one of said precomputed intersymbol interference estimates based on a second past decision from a corresponding state; computing a branch metric based on said selected precomputed intersymbol interference estimates; computing a new path metric for a path extension from a corresponding state based on said computed branch metrics; and determining a best survivor path into a state by selecting a path having a best new path metric among said corresponding computed new path metrics.
 13. The method of claim 12, wherein said first or second past decisions from a corresponding state include a survivor symbol.
 14. The method of claim 12, wherein said first or second past decision from a corresponding state includes an add-compare-select decision.
 15. The method of claim 12, wherein said method allows said reduced-state sequence estimation technique to be pipelined before or after each of said selections.
 16. The method of claim 12, wherein said signal is a multi-dimensional signal, and transitions in a trellis processed by said reduced-state sequence estimation technique correspond to multi-dimensional symbols, wherein said steps of precomputing and selecting branch metrics comprise the steps of: computing one-dimensional branch metrics based on said precomputed intersymbol interference estimates; selecting one of said computed one-dimensional branch metrics based on a past decision from a corresponding state; and combining said selected one-dimensional branch metrics to obtain a multi-dimensional branch metric.
 17. The method of claim 12, wherein said signal is a multi-dimensional signal, and transitions in a trellis processed by said reduced-state sequence estimation technique correspond to multi-dimensional symbols, wherein said steps of precomputing and selecting branch metrics comprise the steps of: computing one-dimensional branch metrics based on said precomputed intersymbol interference estimates; combining said one-dimensional branch metrics to obtain at least two-dimensional branch metrics; and selecting one of said at least two-dimensional branch metrics based on a past decision from a corresponding state.
 18. The method of claim 17, wherein said selection of an appropriate at least two-dimensional branch metric corresponding to a particular state is based on at least two-dimensional survivor symbols from a corresponding state.
 19. A method for processing a signal received from a dispersive channel using a reduced-state sequence estimation technique, said channel having a channel impulse response, said method comprising the steps of: precomputing partial intersymbol interference estimates for each of a plurality of postcursor taps of said channel impulse response, wherein said partial intersymbol interference estimates are based on each possible value for a data symbol; selecting a precomputed partial intersymbol interference estimate for each of said plurality of postcursor taps other than a first postcursor tap based on a past decision from a corresponding state, wherein a precomputed partial intersymbol interference estimate for a first postcursor tap is a precomputed intersymbol interference estimate; precomputing branch metrics based on said precomputed intersymbol interference estimate; selecting one of said precomputed branch metrics based on a past decision from a corresponding state; computing a new path metric for a path extension from a corresponding state based on said selected branch metrics; and determining a best survivor path into a state by selecting a path having a best new path metric among said corresponding computed new path metrics.
 20. The method of claim 19, wherein said past decision from a corresponding state includes a survivor symbol.
 21. The method of claim 19, wherein said past decision from a corresponding state includes an add-compare-select decision.
 22. The method of claim 19, wherein said step of precomputing partial intersymbol interference estimates for a given postcursor tap further comprises the step of combining (i) a partial intersymbol interference estimate for a previous postcursor tap, and (ii) a product of a channel coefficient value and a possible symbol value.
 23. The method of claim 22, wherein said partial intersymbol interference estimate for a previous postcursor tap is a selected precomputed partial intersymbol interference estimate.
 24. The method of claim 22, wherein said partial intersymbol interference estimate for a previous postcursor tap is a non-speculative partial intersymbol interference estimate.
 25. The method of claim 24, wherein said non-speculative partial intersymbol interference estimate is based on a past decision from a corresponding state.
 26. The method of claim 24, wherein said non-speculative partial intersymbol interference estimate is based on a data symbol associated with a corresponding state.
 27. The method of claim 19, wherein said reduced-state sequence estimation technique is selected from the group consisting essentially of (i) a decision-feedback sequence estimation technique; (ii) a delayed decision-feedback sequence estimation technique; or (iii) a parallel decision-feedback decoding technique.
 28. The method of claim 19, wherein said method allows said reduced-state sequence estimation technique to be pipelined before or after each of said selections.
 29. The method of claim 19, wherein said first selecting step comprises the step of selecting a precomputed partial intersymbol interference estimate for a group of said plurality of postcursor taps other than a first group of postcursor taps based on past decisions, wherein a precomputed partial intersymbol interference estimate for a first group of postcursor taps is a precomputed intersymbol interference estimate.
 30. The method of claim 29, wherein said step of precomputing said partial intersymbol interference estimates for a given group of postcursor taps further comprises the step of combining (i) a partial intersymbol interference estimate for a previous group of postcursor taps, and (ii) a combination of products of channel coefficients and possible symbol values for each tap of said plurality of postcursor taps.
 31. A method for processing a signal received from a dispersivechannel using a reduced-state sequence estimation technique, saidchannel having a channel impulse response, said method comprising thesteps of: precomputing partial intersymbol interference estimates foreach of a plurality of postcursor taps of said channel impulse response,wherein said precomputed partial intersymbol interference estimates arebased on each possible value for a data symbol; selecting a precomputedpartial intersymbol interference estimate for said plurality ofpostcursor taps based on a past decision from a corresponding state,wherein a selected partial intersymbol interference estimate for a firstpostcursor tap is a selected intersymbol interference estimate;computing a branch metric based on said selected intersymbolinterference estimate; selecting one of said precomputed branch metricsbased on a past decision from a corresponding state; computing a newpath metric for a path extension from a corresponding state based onsaid selected branch metrics; and determining a best survivor path intoa state by selecting a path having a best new path metric among saidcorresponding computed new path metrics.
 32. The method of claim 31,wherein said signal is a multi-dimensional signal, and transitions in atrellis processed by said reduced-state sequence estimation techniquecorrespond to multi-dimensional symbols, wherein said steps ofprecomputing and selecting branch metrics comprise the steps of:precomputing one-dimensional branch metrics based on said precomputedintersymbol interference estimates; selecting one of said precomputedone-dimensional branch metric based on a past decision from acorresponding state; and combining said corresponding selectedone-dimensional branch metrics to obtain a multi-dimensional branchmetric.
 33. The method of claim 31, wherein said signal is amulti-dimensional signal, and transitions in a trellis processed by saidreduced-state sequence estimation technique correspond tomulti-dimensional symbols, wherein said steps of precomputing andselecting branch metrics comprise the steps of: precomputingone-dimensional branch metrics based on said precomputed intersymbolinterference estimates; combining said one-dimensional branch metric toprecompute at least two-dimensional branch metrics; and selecting one ofsaid precomputed at least two-dimensional branch metrics based on a pastdecision from a corresponding state.
 34. The method of claim 33, wherein said selection of an appropriate at least two-dimensional branch metric corresponding to a particular state is based on at least two-dimensional survivor symbols from a corresponding state.
 35. The method of claim 31, wherein said first selecting step comprises the step of selecting a precomputed partial intersymbol interference estimate for a group of said plurality of postcursor taps based on past decisions, wherein a selected partial intersymbol interference estimate for a first group of postcursor taps is a selected intersymbol interference estimate.
 36. The method of claim 35, wherein said step of precomputing said partial intersymbol interference estimates for a given group of postcursor taps further comprises the step of combining (i) a partial intersymbol interference estimate for a previous group of postcursor taps, and (ii) a combination of products of channel coefficients and possible symbol values for each tap of said plurality of postcursor taps.
 37. A reduced-state sequence estimator for processing a signal received from a dispersive channel having a channel impulse response, comprising: a decision feedback unit for precomputing intersymbol interference estimates based on a combination of (i) speculative partial intersymbol interference estimates for a first postcursor tap of said channel impulse response, wherein said speculative intersymbol interference estimates are based on each possible value for a data symbol, and (ii) a combination of partial intersymbol interference estimates for each subsequent postcursor tap of said channel impulse response, wherein at least one of said partial intersymbol interference estimates for said subsequent postcursor taps is based on a first past decision from a corresponding state; a branch metrics unit for precomputing branch metrics based on said precomputed intersymbol interference estimates; a multiplexer for selecting one of said precomputed branch metrics based on a second past decision from a corresponding state; an add-compare-select unit for computing a new path metric for a path extension from a corresponding state based on said selected branch metrics and determining a best survivor path into a state by selecting a path having a best new path metric among said corresponding computed new path metrics; and a set of pipeline registers to perform said reduced-state sequence estimation in two stages.
 38. A reduced-state sequence estimator for processing a signal received from a dispersive channel having a channel impulse response, comprising: a decision feedback unit for precomputing intersymbol interference estimates based on a combination of (i) speculative partial intersymbol interference estimates for a first postcursor tap of said channel impulse response, wherein said speculative intersymbol interference estimates are based on each possible value for a data symbol, and (ii) a combination of partial intersymbol interference estimates for each subsequent postcursor tap of said channel impulse response, wherein at least one of said partial intersymbol interference estimates for said subsequent postcursor taps is based on a first past decision from a corresponding state; a multiplexer for selecting one of said precomputed intersymbol interference estimates based on a second past decision from a corresponding state; a branch metrics unit for computing a branch metric based on said selected precomputed intersymbol interference estimates; an add-compare-select unit for computing a new path metric for a path extension from a corresponding state based on said selected branch metrics and determining a best survivor path into a state by selecting a path having a best new path metric among said corresponding computed new path metrics; and a set of pipeline registers to perform said reduced-state sequence estimation in two stages.
 39. A reduced-state sequence estimator for processing a signal received from a dispersive channel having a channel impulse response, comprising: a decision feedback unit for precomputing partial intersymbol interference estimates for each of a plurality of postcursor taps of said channel impulse response, wherein said partial intersymbol interference estimates are based on each possible value for a data symbol; a multiplexer for selecting a precomputed partial intersymbol interference estimate for each of said plurality of postcursor taps other than a first postcursor tap based on a past decision from a corresponding state, wherein a precomputed partial intersymbol interference estimate for a first postcursor tap is a precomputed intersymbol interference estimate; a branch metrics unit for precomputing branch metrics based on said precomputed intersymbol interference estimate; a multiplexer for selecting one of said precomputed branch metrics based on a past decision from a corresponding state; an add-compare-select unit for computing a new path metric for a path extension from a corresponding state based on said selected branch metrics and determining a best survivor path into a state by selecting a path having a best new path metric among said corresponding computed new path metrics; and at least one set of pipeline registers to perform said reduced-state sequence estimation in at least two stages.
 40. A reduced-state sequence estimator for processing a signal received from a dispersive channel having a channel impulse response, comprising: a decision feedback unit for precomputing partial intersymbol interference estimates for each of a plurality of postcursor taps of said channel impulse response, wherein said precomputed partial intersymbol interference estimates are based on each possible value for a data symbol; a multiplexer for selecting a precomputed partial intersymbol interference estimate for said plurality of postcursor taps based on a past decision from a corresponding state, wherein a selected partial intersymbol interference estimate for a first postcursor tap is a selected intersymbol interference estimate; a branch metrics unit for computing a branch metric based on said selected intersymbol interference estimate; a multiplexer for selecting one of said precomputed branch metrics based on a past decision from a corresponding state; an add-compare-select unit for computing a new path metric for a path extension from a corresponding state based on said selected branch metrics and determining a best survivor path into a state by selecting a path having a best new path metric among said corresponding computed new path metrics; and at least one set of pipeline registers to perform said reduced-state sequence estimation in at least two stages.
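
By way of illustration only, and without limiting the claims, the precompute-then-select data flow recited for the estimator of claim 37 may be sketched in software for a single state and a one-dimensional signal. The sketch below is a minimal example under stated assumptions: the five-level symbol alphabet, the squared-error branch metric, the convention that a smaller path metric is better, and all function and variable names are hypothetical and are not taken from the specification. The pipeline registers themselves are not modeled; the sketch only shows that the multiplexed speculative result may equal the result of a direct computation performed after the past decision is known.

```python
SYMBOLS = (-2, -1, 0, 1, 2)  # assumed symbol alphabet (illustrative only)


def precompute_isi(taps, older_decisions):
    """Return one speculative ISI estimate per possible newest symbol.

    taps[0] is the first postcursor tap; taps[1:] pair with older_decisions,
    the survivor symbols of one state that are already known.
    """
    tail = sum(f * a for f, a in zip(taps[1:], older_decisions))
    return {a: taps[0] * a + tail for a in SYMBOLS}


def precompute_branch_metrics(z, isi_by_symbol, candidate):
    """Squared-error branch metric (assumed) for each speculative estimate."""
    return {a: (z - u - candidate) ** 2 for a, u in isi_by_symbol.items()}


def select(precomputed, past_decision):
    """Multiplexer: pick the entry matching the decision once it is known."""
    return precomputed[past_decision]


def add_compare_select(extensions):
    """extensions: iterable of (old_path_metric, branch_metric, predecessor).

    Returns (best_new_path_metric, predecessor), smallest metric winning.
    """
    return min((pm + bm, pred) for pm, bm, pred in extensions)


def direct_branch_metric(z, taps, decisions, candidate):
    """Reference: metric computed only after all past decisions are known."""
    u = sum(f * a for f, a in zip(taps, decisions))
    return (z - u - candidate) ** 2
```

In this sketch the speculative work (`precompute_isi`, `precompute_branch_metrics`) does not depend on the newest decision, so only `select` and `add_compare_select` would remain in the decision feedback loop; this is the property that a pipeline register placed between the two groups is intended to exploit.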